Chromosome-level genome provides insight into the evolution and conservation of the threatened goral (Naemorhedus goral)

Background Gorals Naemorhedus resemble both goats and antelopes, which prompts much debate about the intragenus species delimitation and phylogenetic status of the genus Naemorhedus within the subfamily Caprinae. Their evolution is believed to be linked to the uplift of the Qinghai-Tibet Plateau (QTP). To better understand its phylogenetics, the genetic information is worth being resolved. Results Based on a sample from the eastern margin of QTP, we constructed the first reference genome for Himalayan goral Naemorhedus goral, using PacBio long-read sequencing and Hi-C technology. The 2.59 Gb assembled genome had a contig N50 of 3.70 Mb and scaffold N50 of 106.66 Mb, which anchored onto 28 pseudo chromosomes. A total of 20,145 protein-coding genes were predicted in the assembled genome, of which 99.93% were functionally annotated. Phylogenetically, the goral was closely related to muskox on the mitochondrial genome level and nested into the takin-muskox clade on the genome tree, rather than other so-called goat-antelopes. The cladogenetic event among muskox, takin and goral occurred sequentially during the late Miocene (~ 11 − 5 Mya), when the QTP experienced a third dramatic uplift with consequent profound changes in climate and environment. Several chromosome fusions and translocations were observed between goral and takin/muskox. The expanded gene families in the goral genome were mainly related to the metabolism of drugs and diseases, so as the positive selected genes. The Ne of goral continued to decrease since ~ 1 Mya during the Pleistocene with active glaciations. Conclusion The high-quality goral genome provides insights into the evolution and valuable information for the conservation of this threatened group. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-09987-5.


Introduction
Gorals Naemorhedus belongs to the subfamily of Caprinae (Bovidae), resembling both goat and antelopes.According to the International Union for Conservation of Nature (IUCN), goral species are assessed as 'Vulnerable' and 'Near Threatened' due to a declining population caused by poaching and destruction of forested mountain habitats [1][2][3].The distribution and evolution of gorals are believed to be linked to the uplift of the Qinghai-Tibet Plateau (QTP) [4], where the most chaotic assessment of goral phylogenetics is found across the southeastern margin, at the junction of tremendous Himalayas and Hengduan Mountains.However, with the solitary and elusive habit, gorals are poorly known, not separated from serows (Capricornis) as an independent genus until three decades ago [5].Even now, many locals still believe that gorals are the offspring of serows (private survey) due to their similar looking and sympatric distribution.It prompts much debate about the intragenus species delimitation and phylogenetic status of the genus Naemorhedus within the subfamily Caprinae.
Within the genus, seven goral species have been proposed, containing N. goral, N. griseus, N. caudatus, N. baileyi, N. bedfordi, N. evansi and N. cranbrooki, based on morphological and ecological characteristics.Having experienced splitting and lumping, the number of goral species varied from three to six [5][6][7].Recently, in the context of the phylogenetic species concept, five goral species are recognized based on complete mitochondrial genomes [4] and six species using gene markers (i.e.Cyt b, D-loop) [8], due to differences in genetic data integrity.
In the subfamily Caprinae, goral is considered morphologically belonging to the tribe Rupicaprini, so-called goat-antelopes, together with serow from south to east Asia, chamois Rupicapra from Europe to west Asia, and Rocky mountain goat Oreamnos from North America [9].However, a similar topology at the mitochondrial genome level is recovered that goral and serow, as sister genera, are much closer to muskox Ovibos moschatus than chamois and mountain goat, while the later two clustered with takin Budorcas taxicolor in another clade [10][11][12][13][14]. Inconsistently, muskox and takin, formerly as the tribe Ovibovini, are clustered and far from mountain goats on the genome tree [15].The status of goral is still uncertain due to data absence.
The bottlenecks encountered in current research suggest the limited explanatory power of the mitochondrial genome to reconstruct phylogenetic relationships, which is not only the foundation of taxonomy, but also the premise of revealing the mechanism of speciation and evolution [16].To a certain extent, genome trees could be better to reflect the overall evolutionary process of species and be closer to species trees [17].Thus, genomicscale data is necessary.Benefiting from the rapid development of sequencing technology, dozens of reference genomes in subfamily Caprinae have been released in the GenBank database, among which only eight have been assembled at chromosome-level and none are available for goat-antelopes to date (https://www.ncbi.nlm.nih.gov/).It limits us to understand the internal phylogeny and evolution of this important group.In this study, the first genome of goral is assembled at the chromosome level using PacBio sequel II long reads and Hi-C technology.The preliminary bioinformatic analyses are also performed.It will serve as a high-quality reference genome, provide valuable information for further genetic analysis and benefit the evolutionary and conservation studies for goral species.

Phylogenetic analysis at the mitochondrial and genome level
The size of the assembled mitochondrial genome of goral sample MKH012 was about 16,554 bp, sharing the sequence similarity from 98.54% with N. goral (Gen-Bank accession No. KT878720) to 99.19% with N. griseus (GenBank accession No. JX188255) in NCBI database (Table S9).It was identified as Himalayan goral N. goral on the mitochondrial genome tree, a sister species of Long-tailed goral N. caudatus (Fig. 2A).The so-called Chinese goral N. griseus (KF500173 and FJ207532) was clustered with N. goral from Shanxi, instead of being monophyletic (Table S9).Two clades within N. goral were identified: the West clade in the southeastern margin of QTP (Tibet, Qinghai and Sichuan) and the East clade in the eastern China (Shanxi and Hubei) (Fig. 2B).This might indicate two evolutionary significant units under N. goral.Among the Caprinae species, goral clustered with serow and muskox, far from chamois and mountain goat which clustered with takin in another clade on the mitochondrial genome tree (Fig. 2A).
Gene families in goral were clustered with the other 9 Bovidae species on the genome tree (Table S10).Based on the identified orthologous gene sets, 1,661 single-copy orthologous genes were obtained and used for molecular phylogenetic analysis (Fig. 3A).On the reconstructed genome tree, goral clustered with muskox and takin, and was closely related to the latter (Fig. 3B), which was inconsistent with their phylogenetic relationship at the mitochondrial genome level.Molecular dating showed that the ancestor of goral and muskox had split from that of sheep and goat at ~ 10.6 Mya (95% HPD = 7.1-14.2),while the muskox showed a split at ~ 8.3 (95% HPD = 5.6-11.2) Mya from the proto-takin-goral stock.The divergence time between takin and goral was ~ 7.6 (95% HPD = 5.0-10.2) Mya.

Gene family
Compared the identified gene families among 10 Bovidae species on the phylogenetic tree, a total of 786 expanded and 3333 contracted gene families were revealed in the N. goral genome (Fig. 3B).After KEGG and GO enrichment analysis of the expanded orthologous groups, 17 significantly over-represented pathways and 280 significantly enriched GO terms were obtained with an adjusted p-value < 0.05 (Tables S11 and S12).Conducted by the branch-site likelihood ratio test with a p-value < 0.05, 144 positively selected genes (PSGs) in the N. goral lineage were identified (Table S13), 4 KEGG pathways and enriched GO terms were obtained with a significance of p-value < 0.05 (Tables S14 and S15).It should be noticed that the expanded gene families and PSGs with significant KEGG pathways are related to the metabolism of drugs and diseases, similarly to the significantly enriched GO terms (Fig. 4).

Synteny analysis
A one-to-one corresponding relationship was observed between chromosomes of goral and its closest related takin and muskox (Table S16), with reverse complementary sequence in several chromosomes, i.e., cluster 4, also known as chromosome X and some chromosome fusions from muskox to goral and between goral and takin (Fig. 3C).Although at scaffold level, the muskox genome appeared to be nearly complete, correlated gene arrangements provide a valuable framework for inferring shared ancestry of genes and utilization of findings from one organism to another.

Demographic history
PSMC analysis suggested in effective population size (Ne) of goral an increase at the beginning of the Pleistocene, then a continuous decrease after ~ 1 Mya (the Xixiabangma Glaciation) and a significant drop by nearly 80-fold (Fig. 5).S3).Furthermore, the goral genome demonstrated an excellent collinearity connection with muskox and takin, indicating a high-quality assembly.
The assembled mitochondrial genome (16,554 bp) was highly similar to that in the GenBank database, showing There is a possibility that the Jialing River is the distribution boundary (Fig. 2B), which probably distinct the subspecies of takin from Sichuan and Qinling [19].
The traditional tribes within the subfamily Caprinae, neither of the 'goat-antelopes (Rupicarini)' nor the 'ovibovines (Ovibovini)' , are supported by the phylogenetic results in this study, due to the goral-serow-muskox clade far from takin-mountain goat-chamois clade on the mitochondrial genome tree and the goral nested into the takin-muskox clade on the genome tree.Groves and Shields have proposed to dissolve the Ovibovini as early as 1996, to which a similar conclusion has been yielded using mitochondrial DNA resources and widely discussed in the subsequent studies [10][11][12][13][14].These two tribes are continuously used and even borrowed into the field of paleontology, such as the 'late Miocene ovibovines' [20] which is also highly debatable and not closely related to the extant ovibovine, phylogenetically [21].
They are so-called 'rogue taxon' with sensitive phylogenetic positions, thought to destabilize trees and present an obstacle to molecular phylogenetics [22].The explanation deduced by Groves and Shields (1996) that "the rapid radiation of Caprinae has probably resulted in enough homoplasy to reduce the resolving power of this phylogenetic analysis" should be considered attentively [23].
On the genome level, the presence of the goral nests in the clade of takin and muskox provides a new insight into their evolution.It might be linked to the evolution of phenotype, gene family and speciation [24].The split between the proto-goral-muskox-takin and proto-sheepgoat stock is around 10.6 Mya, and that between the muskox and the proto-takin-goral stock at ~ 8.3 is almost identical to the estimation from Li et al. ( 2022) [15].The cladogenetic event among goral, takin and muskox occurred during the late Miocene (~ 11 − 5 Mya), when QTP experienced a third dramatic uplift, the entire plateau and all mountains surrounding the Tibet-Himalaya-Hengduan region had reached their current elevations, with consequent profound changes in environment and biodiversity through the influence of topography on atmospheric precipitation [25,26].Drying and cooling in central and northern Asia led to the transition of climates and ecosystems over large areas [27,28], which might have caused the divergence of muskox restricted in a circarctic distribution, currently.As forest-dwelling species, takin and goral in the mountain-gorge landform in the south and east of the plateau, where the rapid speciation within genus was favored, closely related to the rapid uplift of the QTP and Hengduan Mountains in the uplift-driven diversification hypothesis.Correlated gene arrangements provide a valuable framework for inferring shared ancestry of genes and utilization findings from one organism to another.Nonetheless, at scaffold level, the muskox genome appeared to be nearly complete to analyze the corresponding relationship between chromosomes of goral and its closest related takin and muskox.
During the Pleistocene, the glaciation was active, especially in the middle (1.2-0.7 Mya) with a fundamental climate transition and environment change which have greatly influenced the evolutionary histories and distribution patterns of most extant species.Since the oldest Xixiabangma Glaciation (1.1-0.8Mya), the Ne of goral has continued to decrease, although there was a brief stability (interglaciation, perhaps), it has ultimately dropped by nearly 80-fold.The similar history has been experienced by takin, milu and snob-nosed monkey [15].However, only one biological replicate limits the power of PSMC analysis.It's still not easy to understand why the significantly expanded gene families and PSGs are related to the metabolism of drugs and diseases.For goral, more resequencing replicates and depth analysis are needed to understand the evolutionary history.

Sample collection and sequencing
The sample (MKH012) used in this study was a female goral, identified morphologically, which was naturally dead in March 2021 from Makehe National Nature Reserve, Banma County of Qinghai Province, China.The leg muscle was collected, transported in a car refrigerator (-20℃) and stored in an ultra-cold storage freezer at -80℃ in the College of Life Sciences, Qinghai Normal University.
High-quality DNA was extracted using the Genomic DNA extraction kit (Cat#13323, Qiagen, Germany).DNA's Concentration was detected by Qubit Fluorometer.Sample integrity and purity were detected by Agarose Gel Electrophoresis.After quality assurance, the DNA was randomly fragmented by Covaris and selected by Agencourt AMPure XP-Medium kit to an average size of 200-400 bp and used for BGI-seq library preparation.The selected fragments were through end-repair, 3' adenylated, adapters-ligation, and PCR Amplifying and the products were recovered by the AxyPrep Mag PCR clean up Kit.The double-stranded PCR products were heat-denatured and circularized by the splint oligo sequence.The single-strand circle DNA (ssCir DNA) was formatted as the final library and qualified by QC.Libraries were then subjected to the DNBSEQ-T7 platform for sequencing.To make sure reads reliable, NGS pairedended sequenced Raw reads for genomic survey were first filtered using fastp (v.0.21.0; under default parameters) to remove low-quality reads, adapters, and reads containing poly-N [29].PacBio libraries with insertions > 20 kb were prepared and sequenced using the Sequel II platform.The Hi-C library was prepared using the restriction endonuclease dpnII and subjected to the DNBSEQ-T7 platform for sequencing.All of the sequencing services were provided by the Haorui Genomics Technology Co., Ltd.(Xi'an, China).

Genome size estimation and assembly
Genome size and heterozygosity were estimated by the kmer analysis performed using NGS DNA data prior [30].Briefly, cleaned reads were subjected to 17-mer frequency distribution analysis using the jellyfish v2.2.10 [31].Then, the genome size was estimated with the 17-mer depth distribution from the 350-bp library cleaned sequencing reads in findGSE [32] and GenomeScope 2.0 [33], with the following equation: G = K-num/K-depth (where K-num is the total number of 17-mers, K-depth denotes the k-mer depth, and G represents the genome size).Further combining the simulation data results of Arabidopsis with different heterozygosity and the frequency peak distribution of 17 kmer, the heterozygosity and repeat content of goral genome were estimated.
Through comparative analysis, the unique read pair around the restriction endonuclease site was determined.By standardizing the restriction endonuclease sites on the genome sketch, Hi-C linkage is used as a measure of the degree of correlation between different Scaffolds.For a genome sketch with a karyotype of 2n, the Scaffold sequence of the sketch is clustered into n chromosomal groups using 3D-DNA (v201008) [34], and then visualized through the interaction signals of each chromosome, corrected through heatmaps.Finally, the Scaffold sequence with the determined sequence and direction is added to connect 100 N's to obtain the final genome sequence at the chromosome level.The integrity of the assembly was evaluated by BUSCO (v5.0.0) with the dataset mammalia_odb9 [35].To assess the continuity of the assemblies, the assembled genome was collinearly aligned with the high-quality genomes of goat (Saanen_ v1) and sheep (ARS-UI_Ramb_v2.0)using Minimap2 (v2.17) [36].
The coding sequences of the single-copy families were multiple-aligned using MUSCLE.The GTRGAMMA substitution model of RAxML (v8.2.10) [52] was used for the phylogenetic tree construction with 1000 bootstrap replicates.Based on the phylogenetic tree, MCMCtree (v4.8) [53] was utilized to compute the mean substitution rates along each branch and estimate the species' divergent time.Three fossil calibration times were obtained from the TimeTree database [54], including divergence times of Capra hirus and Ovis aries (6-11.7 MYA).MP-EST [55] was used to construct root-cause and species phylogenetic trees.The branch length of 7.0 indicated that all gene trees support the triple structure of species phylogenetic tree with Pseudo-likelihood scores.

Gene family analysis
The expansion and contraction of orthologous gene families were detected by CAFÉ v4.2.1 (-p 0.05 -t 10 -r 10,000) with a birth and death process to model gene gain and loss over a phylogeny [56].Using R package clusterProfiler [57], both GO and KEGG functional enrichment analyses of the expanded gene families were performed.To identify positively selected genes in the goral lineage, we calculated average Ka/Ks values and conducted the branch-site likelihood ratio test using CodeML implemented in the PAML package Version 4.8 [58].Genes with a p-value < 0.05 under the branch-site model were considered positively selected genes.Heatmap was plotted by online platform https://www.bioinformatics.com.cn [59].

Synteny analysis
Correlated gene arrangements among N. goral, B. taxicolor, and O. moschatus provide a valuable framework for inference of shared ancestry of genes and for the utilization of findings from organism to organism.LAST software (v942) [60] was used to search species-pairwise synteny blocks, and JCVI (v1.2.7) utility libraries in MCscan (Python version) [61] were used to analyze and visualize the synteny between different genomes.

Fig. 1
Fig. 1 Genome analysis of Naemorhedus goral.A: Genome survey using 17-mer analysis.B: Circos plotting from outer to inner: chromosome (Mb), DNA, LINE and simple repeat content, GC content, gene density, goral photoed by camera traps in Makehe, Qinghai province, China.C: Hi-C interaction heatmap

Fig. 2 A
Fig. 2 A: Phylogenetics at the mitochondrial genome level.Naemorhedus species are shown in bold.In other genera, one representative species was used for reconstruction.B: Distributions of N. goral (blue) and N. griseus (pink) assessed in IUCN, with collection localities corresponding to two clades colored in A

Fig. 3
Fig. 3 Orthologous genes and phylogenetic relationship among genomes of N. goral and other nine Bovidae species.A: Venn plot of single-copy genes and specific genes of each species.B: Phylogeny with expanded/contracted gene families and divergence time estimation.C: Collinear relationship of N. goral, B. taxicolor and O. moschatus

Fig. 4
Fig. 4 Enrichment analysis for expanded gene families by KEGG (A) and GO (B), and for positively selected genes by KEGG (C) and GO (D).The first 20 in each class were displayed.Value around each bar indicates number involved in each pathway